Abstract —We consider multi-hop wireless networks 
serving multiple flows in which only packets that meet hard 
end-to-end deadline constraints are useful, i.e., if a packet 
is not delivered to its destination node by its deadline, 
it is dropped from the network. We design decentralized 
scheduling policies for such multi-hop networks that attain 
the maximum throughput of useful packets. The resulting 
policy is decentralized in the sense that in order to 
make a transmission decision, a node only needs to know 
the “time-till-deadline” of the packets that are currently 
present at that node, and not the state of the entire 
network. The key to obtaining an easy-to-implement and 
highly decentralized policy is to replace the hard constraint 
on the number of simultaneous packet transmissions that 
can take place on the outgoing links of a node, by a time- 
average constraint on the number of transmissions. The 
policy thus obtained is guaranteed to provide maximum 
throughput. The analysis can be extended to the case of time-varying channel conditions in a straightforward manner.

Simulations showing significant improvement over existing policies for deadline-based scheduling, such as Earliest Deadline First, and supporting the theory, are presented.

I. Introduction 

The focus of this paper is on multi-hop networks with per-packet end-to-end deadlines. Scheduling policies such as the back-pressure algorithm [1], which are designed to achieve maximum throughput for traditional multi-hop wireless networks, provide guarantees only on average end-to-end delays, not per-packet delays. Due to the unreliable nature of the wireless medium, the per-packet delay along sample paths can become arbitrarily large. In fact, the back-pressure algorithm, or the MaxWeight scheduler, has been shown to optimize the average end-to-end network delay only in the heavy-traffic regime [2]. However, since the average delay grows roughly as $\frac{1}{1 - \text{traffic intensity}}$, the delay becomes unbounded as the traffic intensity approaches 1. This is not desirable for applications such as cyber-physical systems, where control loops are closed over networks and system stability is sensitive to delays.

In this paper, we consider multi-hop wireless networks serving various flows, in which a packet that is not delivered to its destination by its deadline is dropped from the network and not counted in the throughput. Since the wireless channel is unreliable, the outcomes of packet transmissions are modeled as random processes. We suppose that each node can transmit multiple packets on its out-links. To incorporate an average power constraint, which presents itself as an average rate constraint on the outgoing communication channels linked to that node, we impose an upper bound on the average number of transmissions that can be made by a node in the network, a quantity which is allowed to depend on the individual node. Furthermore, we assume that a node can transmit and receive packets on multiple channels, a property which can be achieved via various multiple-access techniques such as TDMA, CDMA, and OFDMA [3]-[5]. We remark that such an assumption allows the network to make full use of the available resource-sharing disciplines.

The throughput of a flow is then the average number 
of packets delivered to its destination node per unit time, 
and our goal is to design decentralized scheduling policies
that maximize the total throughput of the network.

We use the scalarization principle [6], and consider 
a weighted throughput of all flows, whose maximum 
value will be one point on the Pareto frontier of the rate 
region. We pose the problem of obtaining a (weighted) 
throughput-maximizing scheduling policy as a Markov 
Decision Process (MDP). 

Our approach to solving this problem is via the Lagrangian dual of this MDP. The Lagrange multipliers associated with the rate constraints are interpreted as prices paid by a packet to a node for transmitting it. This renders our approach very different from that leading to the backpressure policy, where the Lagrange multipliers are associated with flows and not packets, and correspond to queue lengths.

The resulting overall MDP decomposes conveniently into a “unit-packet unit-flow” MDP (Section VI). This makes possible a decentralized packet-by-packet solution, where not just flows or nodes but individual packets can be individually optimized with respect to their treatment by nodes. A node only needs to know the remaining lifetime till deadline of each packet that is present at the node, and makes a decision on transmitting the packet based on that. Moreover, the packet-level MDP does not suffer from a severe curse of dimensionality, since the size of its state space is only the number of nodes multiplied by the bound on the relative deadline of the packet, and it is thus relatively easily solved. Thus, the introduction of Lagrange multipliers, specifically prices for attempting packet transmissions, gives rise to a tractable and easy-to-implement decentralized scheduling policy. These Lagrange multipliers are shown to be computable in a decentralized online fashion.

The key to our results is the long-term average constraint on the number of concurrent packet transmissions by a node. This can be regarded as a long-term average relaxation of a more stringent hard constraint on the number of concurrent transmissions that a node may make at any time. This relaxation is in the same spirit as Whittle’s relaxation for multi-armed bandits [7], [8], where the constraint on the number of arms that can be simultaneously pulled is relaxed to a constraint on the average number of arms that can be pulled. Our relaxation therefore also has an asymptotic optimality property in the same manner that Whittle’s relaxation has. In Whittle’s case it is asymptotically optimal as the number of arms goes to infinity. Our relaxation is asymptotically optimal as the link capacities are scaled proportionally across the entire network, which results from the fact that, on average, not much is gained in terms of the total reward collected by relaxing the hard constraint to an average constraint [9]. The analysis can be carried out using the theory of large deviations for Markov processes, and it can be shown that the sub-optimality gap between the relaxed policy and the optimal policy is of the order [expression lost in extraction]. Extensive simulations have been performed which show that the policy thus obtained drastically outperforms existing policies such as Back Pressure, Earliest Deadline First, and Debt-Based policies.

The system model considered excludes a key aspect of wireless networks, namely wireless interference. It could be incorporated by not allowing interfering links to schedule packet transmissions simultaneously. This would require the coordinated effort of all the nodes in order to decide the optimal set of non-interfering links to be activated, and is the subject of future work.

II. System Model 

We consider wireless networks in which the data packets have a hard deadline constraint on the time at which they are delivered to their destination nodes in order to be regarded as useful and counted in the throughput. The network comprises several nodes $1, 2, \ldots, V$ that are connected via wireless links. The wireless network is described by a directed graph in which there is a directed edge $i \to j$ if node $i$ can transmit packets to node $j$.

We assume that time is discrete, and evolves over discrete time slots numbered $1, 2, \ldots$. One time-slot is the time taken to attempt a packet transmission over any link in the wireless network. The network is shared by $F$ flows $f_1, f_2, \ldots, f_F$. Each flow $f_i$ has a pre-determined route comprised of a sequence of links that connect the source node of the flow $f_i$ to its destination node; see Figure 1.

Fig. 1. Multi-hop network serving three flows: $f_1$ on the route $s_1 \to b \to d_1$, $f_2$ on the route $s_2 \to s_1 \to d_2$, and $f_3$ on the route $s_3 \to b \to d_3$.

The wireless channel between any two nodes is allowed to be random. If a packet of any flow $f$ is attempted on the wireless link $l$, then the transmission is successful with probability $p_l$. The outcomes of packet transmission attempts are independent across links and time-slots.

Each node $v$ has an average rate constraint $M_v$, which is the maximum number of packet transmission attempts per unit time that it can make. We will use the notation $l \in v$ to mean that link $l$ is in the set of out-links from node $v$, and thus $v$ can use the link $l$ for transmitting packets. Let $u_l^f(t)$ be the random variable which denotes the number of packets of flow $f$ that are attempted for transmission on link $l$ at time $t$. The rate constraints are given by

$$\limsup_{T \to \infty} \frac{1}{T} E\left( \sum_{t=1}^{T} \sum_{f} \sum_{l \in v} u_l^f(t) \right) \le M_v, \quad \forall v. \tag{1}$$

Note that we allow a node to transmit and receive packets simultaneously over several outgoing links, which is possible via various techniques such as TDMA, OFDMA, CDMA, etc. [3]-[5], and hence $M_v$ can be larger than 1. The rate constraint is due to the fact that wireless nodes have power constraints, which in turn induce constraints on communication.
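To make the distinction between an average and a per-slot constraint concrete, the following minimal sketch evaluates a finite-horizon proxy for the left-hand side of (1) at a single node. The function names and the finite horizon are illustrative assumptions, not part of the model, which is stated in terms of a limit.

```python
def average_attempt_rate(attempts_per_slot):
    """Finite-horizon average of transmission attempts at one node.

    attempts_per_slot[t] is the total number of attempts the node made on
    all of its out-links in slot t (a proxy for the lim sup in (1)).
    """
    if not attempts_per_slot:
        raise ValueError("need at least one time slot")
    return sum(attempts_per_slot) / len(attempts_per_slot)


def within_rate_budget(attempts_per_slot, m_v):
    """True if the empirical average attempt rate does not exceed m_v."""
    return average_attempt_rate(attempts_per_slot) <= m_v


# A bursty node: 3 attempts in some slots, none in others.
bursty = [3, 0, 3, 0]
rate = average_attempt_rate(bursty)
ok_for_budget_2 = within_rate_budget(bursty, 2)
ok_for_budget_1 = within_rate_budget(bursty, 1)
```

In this example individual slots carry 3 attempts, more than a budget of 2, yet the average constraint is satisfied; a hard per-slot constraint would rule this schedule out.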

Each packet that is generated by the network has a “relative deadline”, and if the packet is not delivered to its destination within this deadline, it is dropped from the network and will not be transmitted in future time-slots. More precisely, if a packet has a relative deadline $\delta$, and is generated at the beginning of time-slot $\tau$, then either it is delivered to its destination node by time-slot $\tau + \delta$, or it is discarded from the network.
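The deadline rule and the link model can be illustrated with a short simulation. This is a sketch under stated assumptions: the packet is attempted in every slot (a naive policy, not the policy derived in this paper), a relative deadline of `rel_deadline` is taken to allow exactly that many transmission attempts, and `simulate_packet` is a hypothetical helper name.

```python
import random


def simulate_packet(reliabilities, rel_deadline, rng):
    """Simulate one packet along a fixed route with per-link success odds.

    reliabilities[i] is the success probability of the i-th link on the
    route (assumed non-empty). The packet is attempted once per slot and is
    dropped if it has not reached the destination after rel_deadline slots.
    Returns True if the packet is delivered in time.
    """
    hop = 0
    for _ in range(rel_deadline):
        # Each attempt succeeds independently with the link's reliability.
        if rng.random() < reliabilities[hop]:
            hop += 1
            if hop == len(reliabilities):
                return True
    return False


rng = random.Random(0)
two_hops_enough_time = simulate_packet([1.0, 1.0], 2, rng)
two_hops_too_little_time = simulate_packet([1.0, 1.0], 1, rng)
dead_link = simulate_packet([0.0], 5, rng)
```

With perfectly reliable links, a two-hop packet needs a relative deadline of at least 2 slots; with a shorter deadline it is dropped regardless of the policy.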

We assume that the relative deadlines of the packets are i.i.d. and bounded by a fixed $\Delta$, where the distribution of the relative deadlines depends on the flow that the packet belongs to. The relative deadline becomes known to the network as soon as the packet is generated.




Our analysis can be extended in a straightforward manner to the case when the relative deadline of a packet is an arbitrary adapted stochastic process, though we eschew that here for brevity. When the deadline is so chosen as an adapted stochastic process, some very useful models can be captured. For example, suppose we have a video context with a frame buffer at the receiver. Then the relative deadline is equal to the “remaining play time” left in the frame buffer, since we do not want the buffer to be emptied. In that case,

Relative Deadline = −(Elapsed time since the last time that the destination buffer was empty, i.e., the current age of the “busy epoch”) + (Number of packets that arrived at the destination since then) × (Time to play one packet).

Note that in this case the deadline process depends on the policy being used.
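The video-buffer deadline above is simple arithmetic, sketched here as a function. The function and argument names are illustrative assumptions; all times are measured in the same unit (for example, time-slots).

```python
def video_relative_deadline(elapsed_since_empty, packets_arrived, play_time):
    """Remaining play time in the receiver's frame buffer.

    elapsed_since_empty: age of the current "busy epoch" of the buffer.
    packets_arrived: packets received at the destination during this epoch.
    play_time: time needed to play out one packet.
    """
    return -elapsed_since_empty + packets_arrived * play_time


# 10 packets arrived, each plays for 1 slot, and 5 slots have elapsed:
# 5 slots of play time remain before the buffer would run empty.
remaining = video_relative_deadline(5, 10, 1)
```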

We assume that the inter-arrival times of packets at different source nodes for each flow are governed by renewal processes having finite means. The throughput attained by a flow $f$ under a policy is defined to be

$$q_f := \liminf_{T \to \infty} \frac{1}{T} E\left( \sum_{t=1}^{T} d_f(t) \right), \tag{2}$$

where the random variable $d_f(t)$ is equal to 1 if a packet of flow $f$ is delivered to its destination at time $t$, and is equal to 0 otherwise, and the expectation is under the policy that is applied. A throughput vector $q$ that can be achieved via some scheduling policy will be called an “achievable throughput vector”, the set of all achievable throughput vectors constitutes the “rate-region”, and a scheduling policy that achieves the rate-region is said to be throughput-optimal.
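As a finite-horizon illustration of the throughput definition, the sketch below averages delivery indicators over a sample path and forms a weighted sum over flows. The helper names, the flow labels, and the use of a single sample path in place of an expectation are assumptions made for the example.

```python
def empirical_throughput(deliveries):
    """Average of the delivery indicators d_f(t) over a finite horizon."""
    if not deliveries:
        raise ValueError("need at least one time slot")
    return sum(deliveries) / len(deliveries)


def weighted_throughput(deliveries_by_flow, weights):
    """Weighted sum of per-flow empirical throughputs."""
    return sum(
        weights[flow] * empirical_throughput(d)
        for flow, d in deliveries_by_flow.items()
    )


sample = {"f1": [1, 0, 1, 1], "f2": [0, 0, 1, 0]}
q_f1 = empirical_throughput(sample["f1"])
total = weighted_throughput(sample, {"f1": 1.0, "f2": 2.0})
```

Here flow f1 delivers in 3 of 4 slots and flow f2 in 1 of 4, so the weighted sum with weights 1 and 2 is 0.75 + 2 × 0.25.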

Of course, all the above definitions depend on the random process which decides the relative deadlines of the packets, and therefore so does the rate-region. Thus we might call such networks “deadline-constrained networks”.

Vectors will be in bold font, and by $\mathbb{R}^N_+$ we refer to $N$-dimensional vectors which are non-negative component-wise.


III. Previous Works

References [10]-[12] consider a network model in which multiple flows share a single-hop network, and all the packets across every flow have the same relative deadline. Since consideration is restricted to a periodic arrival process with relative deadline equal to the period, at most one packet belonging to each flow is present in the network at any given time. References [13], [14] consider a similar one-hop network model and characterize the throughput-maximizing policy. Clearly the single-hop model is restrictive.

Reference [15] considers the challenging problem of scheduling deadline-constrained packets over a multi-hop network, but the proposed policies are not shown to have any provable guarantees on the resulting throughput. To the best of the authors’ knowledge, [16] is the only work which provides a sub-optimal policy with a provable guarantee for deadline-constrained networks, though it only concerns wired networks. Moreover, the policies proposed in [16] guarantee only a fraction

$$\frac{1}{\text{length of the longest route in the network}}$$

of the maximum possible throughput, i.e., only a small fraction of the capacity region.

The MaxWeight policy [2] has been popular among researchers in designing multi-hop scheduling and routing policies such as Back-Pressure [1], or decentralized CSMA wireless scheduling policies [17], [18]. It should be noted, however, that an application of the MaxWeight principle to design multi-hop scheduling policies for deadline-constrained wireless networks results in the number of possible switch vectors available to the scheduler [2] growing roughly exponentially in the following quantities: the total number of flows, the sum of the relative deadlines of the flows, and the sum of the route lengths of the flows. While some of the previous works on deadline-constrained scheduling [10]-[12] have been successful in designing deadline-constrained throughput-optimal “debt-based” policies for single-hop networks, any generalization to the case of multi-hop wireless networks seems difficult.

Joint routing and scheduling policies designed for multi-hop networks, such as Backpressure [1], are throughput-optimal, but perform poorly with respect to the end-to-end delay [19]-[22]. References [23] and [24] perform a fluctuation analysis of the “packet starvation” that occurs in deadline-constrained networks, while [25]-[27] develop a framework for service regularity for networks shared by multiple flows.

Thus, the present knowledge of provably optimal scheduling policies for deadline-constrained multi-hop wireless networks is severely limited. Any optimal scheduling policy needs to take into account how much time each packet has spent in the network, as well as the channel reliabilities of the links that the packets have to traverse in order to reach the destination node.

IV. Characterizing the Rate Region 

The rate-region of the network (defined in Section II) will be denoted by $\Lambda$. We note that in order to characterize it, it is sufficient to characterize the set of Pareto-optimal vectors, defined as

$$\left\{ q \in \Lambda : \exists\, \alpha \in \mathbb{R}^F_+ \text{ such that } q \in \arg\max_{y \in \Lambda} \sum_{f} \alpha_f y_f \right\},$$

since $\Lambda$ is simply its closed convex hull. Thus the problem of obtaining the set $\Lambda$ is equivalent to that of finding scheduling policies which maximize a non-negatively weighted sum of throughputs.



A. Constrained MDP Formulation 

The problem of maximizing a non-negatively weighted sum of throughputs subject to rate constraints can be posed as a Constrained Markov Decision Process (CMDP) [28]. The system state can be described as follows. At time $t$, each packet in the network belonging to flow $f$ is described by the two-tuple $(l, s)$, where $l$ is the link at which the packet is present, and $s$ is the time-to-go till its deadline. The state of the network is then simply given by the states of each packet present in the network.

Since the number of packets in the network at any time is bounded (because the relative deadlines are bounded, only a bounded number of packets can hope to be served in this time period and the others can be safely dropped), the system state $x(t)$ takes on finitely many values. A scheduling policy $\pi$ has to choose, for each time $t$, at each node, which packets to transmit from the set of packets available to it. The probability distribution of the system state at time $t+1$, $x(t+1)$, depends only on the system state at time $t$, $x(t)$, and the action chosen at time $t$ by the policy $\pi$. The problem of maximizing the throughput subject to the node-capacity constraints (1) is posed as a Constrained Markov Decision Process, where a reward of $\alpha_f$ is received when a packet of flow $f$ is delivered to its destination.

Thus a policy maximizing the network throughput solves the following optimization problem, which we call the Primal MDP:

$$\max_{\pi} \; \liminf_{T \to \infty} \frac{1}{T} E_{\pi}\left( \sum_{t=1}^{T} \sum_{f} \alpha_f d_f(t) \right) \quad \text{subject to the rate constraints (1).} \tag{3}$$

We note that the above CMDP, parameterized by the vector $\alpha := (\alpha_1, \ldots, \alpha_F)$, is solved by a Stationary Randomized Policy [28]. Since the state-space of the network and the number of link-capacity constraints (1) are finite, it follows that there is a finite set $\{\pi_1, \pi_2, \ldots, \pi_M\}$ of Stationary Randomized Policies such that for each value of $\alpha$, there is a policy that belongs to this set and solves the CMDP (3) [28]. Let $\gamma_1, \gamma_2, \ldots, \gamma_M$ be the vectors of throughputs associated with the policies $\pi_1, \pi_2, \ldots, \pi_M$. We then have the following characterization of $\Lambda$.

Lemma 1:

$$\Lambda = \left\{ q : q = \sum_{i=1}^{M} c_i \gamma_i, \; c_i \ge 0, \; \sum_{i=1}^{M} c_i \le 1 \right\}.$$

Note that the number of Stationary Markov policies is exponentially large in the parameter: (maximum possible number of packets in the network) × (maximum path length of the flows) × (maximum possible relative deadline).

Hence using Lemma 1 to compute $\Lambda$ is out of the question. Thus we seek to design low-complexity decentralized scheduling policies that achieve the region $\Lambda$.

V. The Dual MDP

We write the Lagrangian for the Primal MDP (3) as

$$\mathcal{L}(\pi, \lambda) = \liminf_{T \to \infty} \frac{1}{T} E_{\pi}\left( \sum_{t=1}^{T} \sum_{f} \alpha_f d_f(t) \right) + \sum_{v} \lambda_v \liminf_{T \to \infty} \frac{1}{T} E_{\pi}\left( -\sum_{t=1}^{T} \sum_{f} \sum_{l \in v} u_l^f(t) \right) + \sum_{v} \lambda_v M_v. \tag{4}$$

The corresponding dual is

$$D(\lambda) = \max_{\pi} \mathcal{L}(\pi, \lambda). \tag{5}$$

Next we develop a useful upper bound on $D(\lambda)$, the proof of which follows from the super-additivity of the lim inf operation and the sub-additivity of the lim sup operation.

Lemma 2:

$$D(\lambda) \le \sum_{f} V_f(\lambda) + \sum_{v} \lambda_v M_v, \tag{6}$$

where

$$V_f(\lambda) := \max_{\pi_f} \left[ \limsup_{T \to \infty} \frac{1}{T} E_{\pi_f}\left( \sum_{t=1}^{T} \alpha_f d_f(t) \right) + \sum_{v} \lambda_v \limsup_{T \to \infty} \frac{1}{T} E_{\pi_f}\left( -\sum_{t=1}^{T} \sum_{l \in v} u_l^f(t) \right) \right],$$

and $\pi_f$ is a policy that schedules packets of flow $f$.

VI. The Single Flow, Single Packet Problem 

Consider the expression in the r.h.s. of (6). We note that the introduction of the vector $\lambda$ while considering the dual problem (5) decouples the problem (3) into $F$ single-flow problems, wherein if a packet of flow $f$ is present at node $v$ at time $t$, and it is attempted on a link $l$ that belongs to the set of outgoing links of node $v$, then flow $f$ is charged a price of amount $\lambda_v$. Thus $\lambda_v$ can be interpreted as the price that the node $v$ charges for each use of any of its out-links.

We note that the above Lagrange multipliers are very 
different from the Lagrange multipliers employed in 
deriving the Backpressure policy. There the Lagrange 
multipliers correspond to queue lengths, whereas here, 
as we will show below, they are the price paid by a 
packet to a node for the privilege of being transmitted. 
It is this difference that will allow us to decompose the
overall problem on a packet-by-packet basis. It is this that
will allow us to obtain a decentralized policy that is not 
only decentralized across flows, but also over nodes, and 
in fact over packets, with not even any coupling between 
packets at the same node. 



We will first solve these $F$ single-flow problems. Then we will show that the policy that implements the solution to the single-flow problem for each flow solves the relaxed problem. Denote by $\pi_f$ a policy for flow $f$. The policy $\pi_f$ knows the value of the node prices $\lambda_v$ of all the nodes $v$ that lie on the route of flow $f$, and the state of all the packets in the network that belong to flow $f$, and has to make a decision whether or not to schedule a packet transmission for flow $f$ at a link $l$, but does not need to keep track of the state of other flows.

The single-flow problem, parametrized by the vector of prices $\lambda$, is to find the policy $\pi_f$ that maximizes

$$V_f(\lambda) = \max_{\pi_f} \limsup_{T \to \infty} \frac{1}{T} E\left( \sum_{j=1}^{N_f(T)} \sum_{t=1}^{T} \left( \alpha_f d_{f,j}(t) - \sum_{v} \lambda_v \sum_{l \in v} u_l^{f,j}(t) \right) \right), \tag{7}$$

where $u_l^{f,j}(t)$ assumes the value 1 if the $j$-th packet belonging to the flow $f$ is served at link $l$ in time-slot $t$, and 0 otherwise, $d_{f,j}(t)$ assumes the value 1 if the $j$-th packet of flow $f$ is delivered to its destination at time $t$, and $N_f(t)$ is the number of packets for flow $f$ that arrive into the network by time $t$. Also note that $u_l^{f,j}(t)$ can assume the value 1 only if the $j$-th packet of flow $f$ is present at link $l$ at time $t$.

Note that while introducing the single-flow problem, we have reduced it to a single-flow single-packet problem, i.e., the total reward earned under the application of a policy $\pi_f$ is the sum of the rewards it earns from each packet. Thus the policy that solves the single-flow problem makes a decision whether or not to schedule a packet for flow $f$ depending only on the state of that single packet. Thus the introduction of node prices $\lambda$ not only decouples the original problem into $F$ separate problems, but further separates the problem for a single flow into problems that each involve only a particular packet of that flow.

Next we describe and solve the single-packet single-flow problem. We note that the state of a packet is much smaller than the state of the entire network. The state of a packet only consists of the node it is at, and its time till deadline. Therefore the Markov Decision Process for the single-packet problem is much more tractable than the MDP for the overall system. To make the discussion simpler, since we are dealing with a single flow, we will assume that the nodes have been re-labelled so that the source node is labelled as 1, while the destination node is $L + 1$. With this notation in place, the link between nodes $i$ and $i + 1$ is labelled as $l_i$ (so that $l_i$ is the $i$-th link on the route). The single flow under consideration has to be served using the route $l_1, l_2, \ldots, l_L$. A single packet for this flow is generated at time $t = 1$ at the source node 1, and needs to be delivered to the node $L + 1$ by the time $D + 1$. The wireless link (channel) $l$ has a channel reliability $p_l$. Thus the packet has a relative deadline of $D$ time-slots, and has to traverse $L$ hops in order to reach the destination.

If the packet is delivered in a time-slot $t \in \{1, 2, \ldots, D + 1\}$, then a reward of $\alpha$ is received and the packet leaves the system. However, if the packet is not delivered by the time $D + 1$, it is removed from the system without generating a reward of $\alpha$ units. If the packet is present at node $v$ and it is attempted, then it is charged an instantaneous price of $\lambda_v$. The total reward accrued by a packet is the reward it obtains if it reaches its destination on time, minus all the prices it paid at the links along its path to the destination.

The policy $\pi_f$ has to decide at each time whether or not to schedule the packet’s transmission, so as to maximize the total expected reward earned in the time interval $\{1, 2, \ldots, D + 1\}$.

The single-packet single-flow problem is clearly a finite-state stochastic dynamic programming problem wherein the state of the packet assumes values in

$$\{ (l_i, s) : 0 \le i \le L \text{ and } L - i \le s \le D - i \},$$

where $l_i$ is the link at which the packet is present and $s$ is the “time remaining till deadline”.

Let $V^{\lambda}(l, s)$ be the maximum expected reward that can be earned by the packet starting in state $(l, s)$. The function $V^{\lambda}(l, s)$ is not to be confused with the function $V_f(\lambda)$ defined for the single-flow problem in (7). The superscript shows the dependence of the value function $V^{\lambda}(l, s)$ on the price vector $\lambda$. At times, the superscript $\lambda$ will be omitted to make the notation simpler.

Clearly $V(l_i, L - i) = 0$ for $i = 1, 2, \ldots, L$, since a packet at link $l_i$ needs at least $L - i + 1$ time-slots to reach the destination node, but since the time-to-go till deadline is only $L - i$, it is dropped from the network. Using the principle of optimality, the values $V(l, s)$ are computed as

$$V(l_i, s) = \max\left\{ V(l_i, s - 1), \; -\lambda_i + p_{l_i} V(l_{i+1}, s - 1) + (1 - p_{l_i}) V(l_i, s - 1) \right\}, \tag{8}$$

where $\lambda_i$ is the price charged by node $i$, from which link $l_i$ originates. The second expression within the braces corresponds to the reward associated with scheduling the packet transmission in state $(l_i, s)$. The first expression is the reward under the choice of not scheduling, and the optimal action is the one that achieves the maximum on the r.h.s. The computations in (8) are performed for all states in the order $(l_L, 1), (l_L, 2), \ldots, (l_{L-1}, 2), \ldots$, and thus the optimal actions and value function are obtained in $LD$ steps.

Let $\phi_f(l, s) = 0$ if the first term in the r.h.s. of (8) is strictly larger, and $\phi_f(l, s) = 1$ if the second term is strictly larger. For states $(l, s)$ in which both the terms in the r.h.s. of (8) are equal, set $\phi_f(l, s)$ to be any value in the set $[0, 1]$. Let $\phi_f$ be the vector consisting of the values $\phi_f(l, s)$. Denote by $\pi_f(\lambda, \phi_f)$ the policy which transmits the packet of flow $f$ with a probability $\phi_f(l, s)$ when the packet is in state $(l, s)$. By construction, $\pi_f(\lambda, \phi_f)$ solves the single-flow single-packet problem.
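The backward recursion (8) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: links are indexed from 0, `s` counts the transmission attempts still available, a delivered packet is worth `alpha`, a packet with `s = 0` is worth nothing, and ties are broken in favour of not transmitting. The name `solve_single_packet` is hypothetical.

```python
def solve_single_packet(prices, reliabilities, alpha, deadline):
    """Dynamic program for one packet on a route of L links.

    prices[i]: price charged for one attempt on link i.
    reliabilities[i]: success probability of link i.
    alpha: reward for on-time delivery.
    deadline: number of attempts available to a fresh packet.
    Returns (V, act): V[i][s] is the best expected net reward with the
    packet at link i and s attempts left; act[i][s] is 1 if transmitting
    is strictly better than waiting, else 0.
    """
    n_links = len(prices)
    # Row n_links stands for "already delivered": value alpha for any s.
    V = [[0.0] * (deadline + 1) for _ in range(n_links + 1)]
    act = [[0] * (deadline + 1) for _ in range(n_links)]
    for s in range(deadline + 1):
        V[n_links][s] = alpha
    for s in range(1, deadline + 1):
        for i in range(n_links):
            wait = V[i][s - 1]
            p = reliabilities[i]
            send = -prices[i] + p * V[i + 1][s - 1] + (1 - p) * V[i][s - 1]
            if send > wait:
                V[i][s], act[i][s] = send, 1
            else:
                V[i][s] = wait
    return V, act


# Two reliable hops, price 0.1 per attempt, reward 1, two attempts allowed.
V, act = solve_single_packet([0.1, 0.1], [1.0, 1.0], 1.0, 2)
# Same route, but the first node is too expensive to be worth using.
V_costly, act_costly = solve_single_packet([1.5, 0.1], [1.0, 1.0], 1.0, 2)
```

In the first case the packet pays 0.1 at each hop and collects 1, for a value of about 0.8 at the source; in the second the price at the first node exceeds any achievable reward, so the optimal action is never to transmit. The table `act` plays the role of the vector of transmission probabilities restricted to the values 0 and 1.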



Thus the optimal policy $\pi_f(\lambda, \phi_f)$ can be parameterized by the vector $\phi_f$. Since the total reward earned in the Single Flow Problem (7) is the sum of the rewards earned from each packet, we have:

Lemma 3: Policies $\pi_f(\lambda, \phi_f)$ solve the Single Flow Problem (7).

It should be noted that the optimal policy depends on the price vector $\lambda$. The key to the dramatic decomposition of an originally very complex problem of optimizing the behavior of the entire network lies in the fundamental nature of the constraint: it is an average constraint on the number of packet transmissions. One can regard this as a relaxation of a hard constraint on the number of concurrent packet transmissions allowed to a node, a relaxation that has an asymptotic optimality property. To see this, an analogy with the multi-armed bandit problem where one is allowed up to $n$ simultaneous pulls per play is revealing. For this problem, there is no Gittins Index; however, Whittle [8] has shown that if this constraint is relaxed to an average constraint of no more than $n$ pulls, then one can obtain a decomposition. Moreover, this relaxation is optimal in the limit as the number of arms of different types and the number of arm pulls are increased in proportion to infinity. In our case too, as the link capacities are scaled proportionally across the entire network, we get asymptotic optimality.

VII. Solution of Primal MDP 

Given a price vector $\lambda$ and a vector $\phi = \{\phi_f, f = 1, 2, \ldots, F\}$, let us denote by $\pi(\lambda, \phi)$ the policy that jointly follows the rule $\pi_f(\lambda, \phi_f)$ for each flow $f$, where $\pi_f(\lambda, \phi_f)$ is as defined in Section VI. First, we show that there exist values of the vectors $\lambda, \phi$ such that the policy $\pi(\lambda, \phi)$ solves the Primal MDP.

Lemma 4: $D(\lambda) = \mathcal{L}(\pi(\lambda, \phi), \lambda)$.

Proof: From the definition of the dual function (5) and the function $V_f$ (7), it follows that

$$\mathcal{L}(\pi, \lambda) \le D(\lambda) \le \sum_{f} V_f(\lambda) + \sum_{v} \lambda_v M_v,$$

for any policy $\pi$. However, since the policies $\pi(\lambda, \phi)$ are Stationary Randomized Markov policies (so that all the lim infs and lim sups in the definition of the Lagrangian, and also in the corresponding rewards associated with the single-flow problem, change to lim), it follows from Lemma 3 that $\sum_{f} V_f(\lambda) + \sum_{v} \lambda_v M_v$ is the value of $\mathcal{L}(\pi, \lambda)$ evaluated at the policies $\pi(\lambda, \phi)$, and thus the inequality above becomes an equality,

$$\mathcal{L}(\pi(\lambda, \phi), \lambda) = D(\lambda) = \sum_{f} V_f(\lambda) + \sum_{v} \lambda_v M_v. \tag{9}$$

Theorem 1: Consider the Primal MDP (3) and its associated dual problem defined in (5). There exists a price vector $\lambda^*$, and vectors $\phi_f^*$, $f = 1, 2, \ldots, F$, such that $(\pi(\lambda^*, \phi^*), \lambda^*)$ is an optimal primal-dual pair, and thus the policy $\pi(\lambda^*, \phi^*)$ solves the relaxed problem.


Proof: We use the ergodic control approach developed in [28]-[30], particularly useful for constrained MDPs, wherein an average-cost MDP can be viewed as a linear programming problem. More precisely, the infinite-horizon average-reward problem can be posed as that of optimizing a linear cost function over a convex set after one considers occupation measures of the combined state-cum-control Markov process. If $\mu^{\pi}$ is the occupation measure induced under the policy $\pi$, the problem reduces to maximizing

$$\sum_{\text{state}, \text{control}} \text{reward}(\text{state}, \text{control}) \, \mu^{\pi}(\text{state}, \text{control}).$$

We further note that the constraints in the Primal MDP are linear functions of $\mu^{\pi}$, and hence the set over which the optimization is to be carried out is convex.

Thus if we show that there exists a policy such that the constraints in the Primal MDP (3) hold with strict inequality “<” rather than “≤”, then the proof of the theorem would follow from Slater’s condition [31]. But the policy which never schedules any packet, regardless of the value of the system state, satisfies the constraints with strict inequality (the parameters $M_v$ are assumed to be > 0, i.e., the total capacity of all the out-links associated with each node $v$ is assumed to be > 0).

We note that the policy $\pi(\lambda^*, \phi^*)$ is decentralized, and each node only needs the knowledge of the time-till-deadline of the packets it has.


A. Obtaining $\lambda^*, \phi^*$

The policy obtained above is decentralized and easy to implement; one has only to obtain the price vector $\lambda^*$ that solves the dual problem (5). Next we provide an iterative algorithm to find $\lambda^*$ that can be implemented in a decentralized manner, i.e., each node needs only local information on the total link usage on all of its out-links. Thus the nodes need not communicate amongst themselves to obtain global information about the state of the network in order to derive $\lambda^*$, which avoids any communication overhead.

Since the dual function $D(\lambda)$ is convex, we can use the sub-gradient method [32] to iteratively find $\lambda^*$. From Lemma 4 and (9), we have $D(\lambda) = \mathcal{L}(\pi(\lambda, \phi), \lambda) = \sum_{f} V_f(\lambda) + \sum_{v} \lambda_v M_v$, where the value of the Lagrangian $\mathcal{L}(\pi(\lambda, \phi), \lambda)$ does not depend on the probability with which the packet is scheduled in those states in which the active (scheduling) and passive (not scheduling) actions yield equal rewards. Thus in order to find the value of the dual function $D(\lambda)$, it suffices to fix the randomizing probability to be 0 for any state in which the active and passive actions yield equal rewards. We can choose as sub-gradients the gradients associated with the policy $\pi(\lambda, \phi)$, i.e., the quantities $\frac{\partial D}{\partial \lambda_v}$, $v = 1, 2, \ldots, V$. Next we provide the expression for the gradients $\frac{\partial D}{\partial \lambda_v}$, $v = 1, 2, \ldots, V$, which helps us in interpreting them as congestion at the nodes, and the node prices as a means to control the congestion.

From the description of the Single Flow Problem (7), it follows that the total cost a packet pays due to the price $\lambda_v$ is equal to $\lambda_v$ times the total number of times that it uses an out-link at node $v$. Thus, letting $r(f, v)$ be a random variable that has the same distribution as the total link usage per time-slot by packets of flow $f$ at node $v$, we have from (6),

$$\frac{\partial D}{\partial \lambda_v} = M_v - \sum_{f} E\left( r(f, v) \right). \tag{10}$$

The quantity $\sum_{f} E\left( r(f, v) \right)$ is the total link usage at node $v$, and thus measures the congestion at node $v$. The price $\lambda_v$ is thus a means to prevent congestion at node $v$.

The iterations are given by

$$\lambda^{k+1} = \lambda^{k} - \alpha_k g^{k}, \tag{11}$$

where $\alpha_k$ is the step size and $g^{k}$ is the sub-gradient evaluated at $\lambda^{k}$ as in (10). Since the value of the $i$-th component of the sub-gradient as provided in equation (10) is local information, meaning that node $i$ can calculate its total link usage without resorting to any communication with other nodes, these iterations can be performed locally. The analysis here can be extended to cover the case of time-varying channels so as to incorporate wireless fading. This is accomplished by appending the state with the state of the wireless channels across the network.

Fig. 2. Multi-hop network serving three flows: $f_1$ on the route $s_1 \to b \to c \to d_1$, $f_2$ on the route $s_2 \to s_1 \to d_2$, and $f_3$ on the route $s_3 \to c \to d_3$.
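One step of the price iteration (11) with the gradient (10) can be sketched as follows. The sketch rests on assumptions not fixed by the text: each node estimates its expected link usage by a measured average, the step size is a given constant, and prices are clipped at zero so that they remain valid Lagrange multipliers. The function name `update_prices` is hypothetical.

```python
def update_prices(prices, measured_usage, capacities, step):
    """One sub-gradient step on the node prices.

    prices[v]: current price at node v.
    measured_usage[v]: node v's own estimate of its average number of
        transmission attempts per slot under the current prices.
    capacities[v]: the rate budget M_v of node v.
    step: step size for this iteration.
    Each component uses only quantities known locally at node v.
    """
    new_prices = []
    for lam, usage, m_v in zip(prices, measured_usage, capacities):
        gradient = m_v - usage          # component v of (10)
        lam_next = lam - step * gradient
        new_prices.append(max(0.0, lam_next))  # assumed projection onto >= 0
    return new_prices


# Node 0 is congested (usage 3 > budget 2), node 1 is under-used.
updated = update_prices([0.5, 0.0], [3.0, 1.0], [2.0, 2.0], 0.1)
```

The congested node raises its price, which discourages packets with little chance of on-time delivery from using it, while the under-used node keeps its price at zero.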

VIII. Simulations 

Apart from the scheduling policy deployed, we note that for a deadline-constrained network, the net throughput depends upon the following factors: the relative deadlines of the packets, the random process that decides the number of packet arrivals in different flows, the channel reliabilities of the various links, and the node capacities $M_v$.

We compare the optimal policy derived in the paper with the policy that implements the EDF scheduling rule at each node in the network. The EDF policy is known to be optimal in the case of a single-hop network.


For the network shown in Figure 2, the channel reliabilities of each link are fixed at 1, except for the link $s_1 \to d_2$, whose reliability is fixed at 0.5. Packets belonging to flow $f_1$ have a relative deadline of 3 time slots, while packets belonging to flows $f_2$ and $f_3$ have a relative deadline of 2 time slots. Moreover, it is assumed that each flow receives 50 packets in each time slot. The node capacities of each node except the node $s_1$ are assumed to be 100 packets/slot. The throughputs of all the flows are weighted equally, i.e., $\alpha_1 = \alpha_2 = \alpha_3 = 1$. Figure 3 compares the throughput attained by the optimal policy with that attained by the EDF policy as a function of the capacity of node $s_1$.

IX. Conclusions 

We have posed the problem of designing throughput-maximizing scheduling policies for deadline-constrained multi-hop wireless networks as a Constrained Markov Decision Process (CMDP). A first look at the complex nature of the problem suggests that the optimal policy should make a scheduling decision based on the entire state of the network, and thus that each node in the network must know the state of the network. We have, however, solved the dual of the original CMDP and thereby derived an optimal policy that is highly decentralized. In order to implement this optimal policy, each node in the network needs to know only the time-till-deadline of each packet present at it.

Furthermore, the problem of obtaining the optimal Lagrange multipliers, or node prices, has also been shown to have a decentralized solution, in which each node monitors its own congestion and updates its price accordingly. The node prices have been shown to be a means of congestion control. Simulations show that the optimal policy indeed outperforms the Earliest Deadline First policy, in agreement with the theory.